Conditional Image Generation




Conditional Image Generation with PixelCNN Decoders

Neural Information Processing Systems

This work explores conditional image generation with a new image density model based on the PixelCNN architecture. The model can be conditioned on any vector, including descriptive labels or tags, or latent embeddings created by other networks. When conditioned on class labels from the ImageNet database, the model is able to generate diverse, realistic scenes representing distinct animals, objects, landscapes and structures. When conditioned on an embedding produced by a convolutional network given a single image of an unseen face, it generates a variety of new portraits of the same person with different facial expressions, poses and lighting conditions. We also show that conditional PixelCNN can serve as a powerful decoder in an image autoencoder. Additionally, the gated convolutional layers in the proposed model improve the log-likelihood of PixelCNN to match the state-of-the-art performance of PixelRNN on ImageNet, with greatly reduced computational cost.
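The gated convolutional layers mentioned above combine masked-convolution features with a projection of the conditioning vector h, gating a tanh path with a sigmoid path. A minimal NumPy sketch of that gated activation (function and variable names are illustrative, not taken from the paper's code; the masked convolutions themselves are assumed to have already produced the two feature maps):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_conditional_activation(conv_f, conv_g, h, V_f, V_g):
    """Gated activation of a conditional PixelCNN layer:
        y = tanh(conv_f + V_f h) * sigmoid(conv_g + V_g h)

    conv_f, conv_g : (C, H, W) feature maps from the masked convolutions.
    h              : (D,) conditioning vector (e.g. a class embedding).
    V_f, V_g       : (C, D) learned projections of the conditioning vector.
    """
    bias_f = (V_f @ h)[:, None, None]   # broadcast one bias per channel
    bias_g = (V_g @ h)[:, None, None]
    return np.tanh(conv_f + bias_f) * sigmoid(conv_g + bias_g)
```

Because the conditioning vector enters only as a per-channel bias, the same h shifts the gating at every spatial position, which is what lets a single label embedding steer the whole generated image.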


Unsupervised Learning of Object Landmarks through Conditional Image Generation

Neural Information Processing Systems

We propose a method for learning landmark detectors for visual objects (such as the eyes and the nose in a face) without any manual supervision. We cast this as the problem of generating images that combine the appearance of the object as seen in a first example image with the geometry of the object as seen in a second example image, where the two examples differ by a viewpoint change and/or an object deformation. In order to factorize appearance and geometry, we introduce a tight bottleneck in the geometry-extraction process that selects and distils geometry-related features. Compared to standard image generation problems, which often use generative adversarial networks, our generation task is conditioned on both appearance and geometry and thus is significantly less ambiguous, to the point that adopting a simple perceptual loss formulation is sufficient. We demonstrate that our approach can learn object landmarks from synthetic image deformations or videos, all without manual supervision, while outperforming state-of-the-art unsupervised landmark detectors. We further show that our method is applicable to a large variety of datasets - faces, people, 3D objects, and digits - without any modifications.
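The "tight bottleneck" in the geometry branch can be pictured as collapsing each raw heatmap to a single expected (x, y) coordinate and then re-rendering that coordinate as a fixed-width Gaussian map for the generator. A hedged NumPy sketch of that idea (the function names and exact normalisation are illustrative assumptions based on the abstract, not the authors' released code):

```python
import numpy as np

def heatmaps_to_landmarks(heatmaps):
    """Collapse K raw heatmaps (K, H, W) to K (x, y) landmark coordinates
    by softmax-normalising each map and taking its spatial expectation."""
    K, H, W = heatmaps.shape
    flat = heatmaps.reshape(K, -1)
    flat = flat - flat.max(axis=1, keepdims=True)   # numerical stability
    probs = np.exp(flat) / np.exp(flat).sum(axis=1, keepdims=True)
    probs = probs.reshape(K, H, W)
    ys, xs = np.mgrid[0:H, 0:W]
    x = (probs * xs).sum(axis=(1, 2))
    y = (probs * ys).sum(axis=(1, 2))
    return np.stack([x, y], axis=1)                 # (K, 2)

def render_gaussian_maps(landmarks, H, W, sigma=2.0):
    """Re-render landmarks as fixed-width Gaussian maps (K, H, W): the
    low-dimensional geometry representation handed to the generator."""
    ys, xs = np.mgrid[0:H, 0:W]
    maps = [np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))
            for (x, y) in landmarks]
    return np.stack(maps)
```

The bottleneck is "tight" because everything except the K coordinate pairs is discarded, which forces appearance information to flow through the other branch.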




Review for NeurIPS paper: ContraGAN: Contrastive Learning for Conditional Image Generation

Neural Information Processing Systems

Reviewers were split on this paper, with three recommending acceptance and one recommending rejection. The main concerns were missing experiments on ImageNet and a lack of clarity about why the method should work, particularly with regard to how it stabilizes training. After the rebuttal, the reviewers and AC were more confident in the experimental results and recommend acceptance, but the authors are urged to 1) complete the full experiments on ImageNet, and 2) analyze stability over multiple runs and provide some discussion of why the proposed method should help stability. Please also see the other detailed recommendations in the reviews.


Reviews: Conditional Image Generation with PixelCNN Decoders

Neural Information Processing Systems

The paper addresses a significant problem in generative modeling and is quite interesting. However, the reviewer feels the current version is not well polished, due to several issues in the experimental section. For the rebuttal, please focus on the points (*), (**), (***), and (****) mentioned in the following paragraphs. The reviewer is willing to change the score if all the concerns are addressed in the rebuttal. Novelty: The proposed model is technically novel in the sense that it explores conditional modeling within the recent Pixel(R/C)NN framework.


ContraGAN: Contrastive Learning for Conditional Image Generation

Neural Information Processing Systems

Conditional image generation is the task of generating diverse images using class label information. Although many conditional Generative Adversarial Networks (GANs) have shown realistic results, such methods consider only pairwise relations between the embedding of an image and the embedding of the corresponding label (data-to-class relations) as the conditioning losses. In this paper, we propose ContraGAN, which considers relations between multiple image embeddings in the same batch (data-to-data relations) as well as the data-to-class relations by using a conditional contrastive loss. The discriminator of ContraGAN discriminates the authenticity of given samples and minimizes a contrastive objective to learn the relations between training images. Simultaneously, the generator tries to generate realistic images that deceive the authenticity check and that have a low contrastive loss.
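A conditional contrastive loss of the kind described combines data-to-class similarity (image embedding vs. its class embedding) with data-to-data similarity (same-class embeddings in the batch) in one softmax-style ratio. A rough NumPy sketch of this idea (all names and the exact formulation are assumptions based on the abstract, not the released ContraGAN code):

```python
import numpy as np

def conditional_contrastive_loss(z, class_emb, labels, temperature=0.1):
    """Hypothetical conditional contrastive (2C-style) loss.

    z         : (N, D) L2-normalised image embeddings from the discriminator.
    class_emb : (C, D) L2-normalised learnable class embeddings.
    labels    : (N,) integer class labels for the batch.

    Positives for sample i are its own class embedding (data-to-class) and
    the other same-class samples in the batch (data-to-data); the
    denominator additionally contains all remaining batch samples.
    """
    N = z.shape[0]
    sim_dd = np.exp(z @ z.T / temperature)                          # data-to-data
    sim_dc = np.exp((z * class_emb[labels]).sum(axis=1) / temperature)  # data-to-class
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(N, dtype=bool)
    pos = sim_dc + (sim_dd * (same & off_diag)).sum(axis=1)
    denom = sim_dc + (sim_dd * off_diag).sum(axis=1)
    return float(np.mean(-np.log(pos / denom)))
```

Because the positive set is always a subset of the denominator, the loss is non-negative and shrinks as same-class embeddings cluster together, which is the intuition behind pulling in data-to-data relations in addition to the usual data-to-class term.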


Reviews: Unsupervised Learning of Object Landmarks through Conditional Image Generation

Neural Information Processing Systems

Summary: This paper proposes a method for conditional image generation by jointly learning "structure" points such as face and body landmarks. The authors propose to use a convolutional neural network with a modified loss to capture the image transformation and landmarks. They evaluate their approach on a set of datasets including CelebA, VoxCeleb, and Human3.6M.

Positive:
- The problem addressed is an important one, and the authors attempt to solve it using a well-engineered approach.

Negatives:
- The pre-processing (using heat maps, normalizing them into probabilities, then applying a Gaussian kernel to produce the features) is a bit heuristic.